-
Couldn't load subscription status.
- Fork 928
Protect UCX endpoints against concurrent creation #12863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If multiple threads initialize the same destination ep simultaneously we endup in situation where we either segfault or we use resources that we will never release (because they are overwritten by another thread). Thus, protect the ep creation with an atomic update, allowing a single thread to create the ep for a particular peer. Fixes open-mpi#12833. Signed-off-by: George Bosilca <[email protected]>
|
FWIW: I will be fixing the segfault as that shouldn't happen. What remains unexplained is why the key is not being found, given that it was "put" prior to the "fence" in MPI_Init and circulated during that operation. This to me is very disturbing. Were you able to track down a reason for it? |
| if (!OPAL_ATOMIC_COMPARE_EXCHANGE_STRONG_PTR((opal_atomic_intptr_t *)&proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_PML], | ||
| &ep, (void*)1)) { | ||
| /* give some slack to the other thread to create the endpooint */ | ||
| while (!PML_UCX_IS_VALID_ENDPOINT(proc)) { | ||
| _opal_lifo_release_cpu(); | ||
| opal_atomic_rmb(); | ||
| } | ||
| goto check_again; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we use a simple lock? this is a slow-path function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a proc-level lock would create the need for an UCX level proc struct instead of using the default ompi_proc_t. A lot of work for little benefit. And having a global lock would be way too inefficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think a global lock is good enough, given that mca_pml_ucx_recv_worker_addres/pmix and ucp_ep_create would probably also use global locks
|
@rhc54 I have not looked into details on why |
|
Yeah, it's puzzling. Clearly, we do find the key when OMPI is single-threaded, so we know it is being properly published and circulated (otherwise nothing would work), and properly retrieved in that situation. Somehow, when OMPI is multi-threaded, we cannot find the key - which is really strange as It's almost like somehow you are getting to the ep formation point before we complete the fence and the data has been stored where |
|
Ahhh...I see what's going on. You might not need this after all - problem lies in PMIx. Put simply, when we do a "fence", we cache the collected data at the server. This helps control the memory footprint as not every piece of data will be required by every local client. The first time you ask for a piece of data for a particular proc, we return all the data "put" by that proc since it is likely you'll be asking for more keys. So you do indeed "not found" on the first call to PMIx_Get - and subsequent calls need to be cached until the server can respond. In this multi-threaded case, you generate another request before the server's response is received - and that hits a bug in PMIx. I'll get that fixed and it should resolve the observed issue. Totally up to you if you want to retain this serialization - I'm not sure if there are negative repercussions and/or what impact it might have. |
|
Okay, one line fix - adding it to PMIx master branch now. You can track it here: openpmix/openpmix#3415 |
|
Thanks for your insight, after trying with your patch I was not able to replicate the initial issue. Thus, with the serialization in |
If multiple threads initialize the same destination ep simultaneously we endup in situation where we either segfault or we use resources that we will never release (because they are overwritten by another thread). Thus, protect the ep creation with an atomic update, allowing a single thread to create the ep for a particular peer.
Fixes #12833.